Abstract

We study randomized gossip-based processes in dynamic networks that are motivated by discovery 
processes in large-scale distributed networks like peer-to-peer or social networks. 

A well-studied problem in peer-to-peer networks is the resource discovery problem. There, the 
goal for nodes (hosts with IP addresses) is to discover the IP addresses of all other hosts. In social 
networks, nodes (people) discover new nodes through exchanging contacts with their neighbors (friends). 
In both cases the discovery of new nodes changes the underlying network (new edges are added to
the network), and the process continues in the changed network. Rigorously analyzing such dynamic
(stochastic) processes with a continuously self-changing topology remains a challenging problem with
obvious applications.

This paper studies and analyzes two natural gossip-based discovery processes. In the push process, 
each node repeatedly chooses two random neighbors and puts them in contact (i.e., "pushes" their mutual 
information to each other). In the pull discovery process, each node repeatedly requests or "pulls" a 
random contact from a random neighbor. Both processes are lightweight, local, and naturally robust due 
to their randomization. 

Our main result is an almost-tight analysis of the time taken for these two randomized processes
to converge. We show that in any undirected $n$-node graph both processes take $O(n\log^2 n)$ rounds to
connect every node to all other nodes with high probability, whereas $\Omega(n\log n)$ is a lower bound. In
the directed case we give an $O(n^2\log n)$ upper bound and an $\Omega(n^2)$ lower bound for strongly connected
directed graphs. A key technical challenge that we overcome is the analysis of a randomized process that
itself results in a constantly changing network, which leads to complicated dependencies in every round.

Keywords: Random process, Resource discovery, Social network, Gossip-based algorithm, Distributed 
algorithm, Probabilistic analysis 
1 Introduction



Many large-scale, real-world networks such as peer-to-peer networks, the Web, and social networks are
highly dynamic with continuously changing topologies. The evolution of the network as a whole is typically
determined by the decentralized behavior of nodes, i.e., the local topological changes made by the
individual nodes (e.g., adding edges between neighbors). Understanding the dynamics of such local processes
is critical both for analyzing the underlying stochastic phenomena, e.g., in the emergence of structures
in social networks, the Web and other real-world networks [6, 27, 28], and for designing practical algorithms
for associated algorithmic problems, e.g., in resource discovery in distributed networks [16, 24] or in the
analysis of algorithms for the Web [?]. In this paper, we study the dynamics of network evolution that
result from local gossip-style processes. Gossip-based processes have recently received significant attention
because of their simplicity of implementation, scalability to large network size, and robustness to frequent
network topology changes; see, e.g., [12, 21, 22, 9, 20, ?, 26, ?, ?, ?] and the references therein. In particular,
gossip-based protocols have been used to efficiently and robustly construct various overlay topologies
dynamically in a fully decentralized manner [18]. In a local gossip-based algorithm (e.g., [10]), each node
exchanges information with a small number of randomly chosen neighbors in each round.¹ The randomness
inherent in gossip-based protocols naturally provides robustness, simplicity, and scalability. While much
of the recent theoretical work on gossip (including work on rumor spreading), especially on push-pull
type algorithms [19, 20, 9, 14, 10, 15], focuses on analyzing various gossip-based tasks (e.g., computing
aggregates or spreading a rumor) on static graphs, a key feature of this work is a rigorous analysis of a
gossip-based process in a dynamically changing graph.

We present two illustrative application domains for our study. First, consider a P2P network, where
nodes (computers or end-hosts with IDs/IP addresses) can communicate only with nodes whose IP addresses
are known to them. A basic building block of such a dynamic distributed network is to efficiently discover
the IP addresses of all nodes that currently exist in the network. This task, called resource discovery [16],
is a vital mechanism in a dynamic distributed network with many applications [16, ?]: when many nodes in
the system want to interact and cooperate, they need a mechanism to discover the existence of one another.
Resource discovery is typically done using a local mechanism [?]: in each round, nodes discover other
nodes, and this changes the resulting network, since new edges are added between the nodes that discovered
each other. As the process proceeds, the graph becomes denser and denser and finally becomes a
complete graph. Such a process was first studied in [16], which showed that a simple randomized process is
enough to guarantee almost-optimal bounds on the time taken for the entire graph to become complete
(i.e., for all nodes to discover all other nodes). Their randomized Name Dropper algorithm operates as
follows: in each round, each node chooses a random neighbor and sends it all the IP addresses it knows. Note
that while this process is also gossip-based, the information sent by a node to its neighbor can be extremely
large (i.e., of size $O(n)$). More recently, self-stabilization protocols have been designed for constructing
and maintaining P2P overlay networks, e.g., [?]. These protocols guarantee convergence to a desired
overlay topology (e.g., the SKIP+ graph) starting from any arbitrary topology via local checking and repair.
For example, the self-stabilizing protocol of [5] proceeds by continuously discovering new neighbors (via
transitive closure) until a complete graph is formed; then the repair process is initiated. This can also be
considered as a local gossip-based process in an underlying virtual graph with changing (added) edges. In
both examples above, the assumption is that the starting graph is arbitrary but (at least) weakly connected.
The gossip-based processes that we study have the same goal (starting from an arbitrary connected
graph, each node discovers all nodes as quickly as possible), but in a setting where individual message sizes are
small ($O(\log n)$ bits).

¹Gossip, in some contexts (see, e.g., [19, 20]), has been used to denote communication with a random node in the network,
as opposed to only a directly connected neighbor. The former model essentially assumes that the underlying graph is complete,
whereas the latter (as assumed here) is more general and applies even to arbitrary graphs. The local gossip process is typically more
difficult to analyze due to the dependencies that arise as the network evolves.

Second, in social networks, nodes (people) discover new nodes through exchanging contacts with their
neighbors (friends). Discovery of new nodes changes the underlying network (new edges are added to
the network), and the process continues in the changed network. For example, consider the LinkedIn
network,² a large social network of professionals on the Web. The nodes of the network represent people,
and edges are added between people who directly know each other, i.e., between direct contacts. Edges are
generally undirected, but LinkedIn also allows directed edges, where only one node is in the contact list of
another node. LinkedIn offers two mechanisms for discovering new contacts. The first can be thought of as a
triangulation process (see Figure 1(a)): a person can introduce two of his friends who could benefit from
knowing each other by giving each of them the other's contact. The second can be thought
of as a two-hop process (see Figure 1(b)): if you want to acquire a new contact, then you can use a shared
(mutual) neighbor to introduce yourself to this contact; i.e., the new contact has to be a two-hop neighbor
of yours. Both processes can be modeled via gossip in a natural way (as we do shortly below), and the
resulting evolution of the network can be studied: e.g., how and when do clusters emerge? How does the
diameter change with time? In the social network context, our study focuses on the following question: how
long does it take for all the nodes in a connected induced subgraph of the network to discover all the nodes
in the subgraph? This is useful in scenarios where members of a social group, e.g., alumni of a school or
members of a club, discover all members of the group through local gossip operations.




Figure 1: (a) Push discovery or triangulation process, (b) Pull discovery or two-hop walk process, (c) Non-monotonicity
of the triangulation process: the expected convergence time for the 4-edge graph exceeds that
for the 3-edge subgraph.



Gossip-based discovery. Motivated directly by the above applications, we analyze two lightweight,
randomized gossip-based discovery processes. We assume that we start with an arbitrary undirected connected
graph and the process proceeds in synchronous rounds. Communication among nodes occurs only through
edges in the network. We further assume that the size of each message sent by a node in a round is at most
$O(\log n)$ bits, i.e., the size of an ID.

1. Push discovery (triangulation): In each round, each node chooses two random neighbors and connects
them by "pushing" their mutual information to each other. In other words, each node adds an
undirected edge between two of its random neighbors; if the two neighbors are already connected,
then this does not create any new edge. Note that this process, which is illustrated in Figure 1(a), is
completely local. To execute the process, a node only needs to know its neighbors; in particular, no
two-hop information is needed. This is similar in spirit to the triangulation procedure of
LinkedIn described earlier, i.e., a node completes a triangle with two of its chosen neighbors.³

2. Pull discovery (two-hop walk): In each round, each node connects itself to a random neighbor
of a neighbor chosen uniformly at random, by "pulling" a random neighboring ID from a random
neighbor. Alternatively, one can think of each node doing a two-hop random walk and connecting to
its destination. This process, illustrated in Figure 1(b), can also be executed locally: a node simply
asks one of its neighbors v for the ID of one of v's neighbors and then adds an undirected edge to the
received contact. This is similar in spirit to the two-hop procedure of LinkedIn described
earlier.⁴

²http://www.linkedin.com

³However, we note that in our process the two neighbors are chosen randomly, unlike in LinkedIn.

Both of the above processes are local in the sense that each node only communicates with its neighbors
in any round, and lightweight in the sense that the amortized work done per node is only a constant per
round. Both processes are also easy to implement and generally oblivious to the current topology structure,
its changes, or failures. It is also interesting to consider variants of the above processes in directed graphs. In
particular, we study the two-hop walk process, which generalizes naturally to directed graphs: each node
does a two-hop directed random walk and adds a directed edge to its destination. We are mainly interested
in the time taken by the process to converge to the transitive closure of the initial graph, i.e., until no more
new edges can be added.
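To make the two undirected processes concrete, the following is a minimal Python sketch of one synchronous round of each and of a driver that runs a process until the graph is complete. It rests on assumptions of our own, not taken from the paper's implementation: a graph is a dict mapping each (sortable) node label to the set of its neighbors; in a push step the two neighbors are drawn independently, so a node may draw the same neighbor twice, in which case no edge is added; and all edges chosen in round t are applied together at the end of the round, so every choice is made on $G_t$. The function names are hypothetical.

```python
import random


def push_round(adj, rng):
    """One synchronous round of push discovery (triangulation).

    Every node u draws two neighbors v and w and introduces them to each other.
    Assumption (ours): v and w are drawn independently and uniformly.
    """
    new_edges = []
    for u in adj:
        nbrs = sorted(adj[u])
        if not nbrs:
            continue
        v, w = rng.choice(nbrs), rng.choice(nbrs)
        if v != w:  # drawing the same neighbor twice adds no edge
            new_edges.append((v, w))
    # Apply all new edges only after every node has chosen on G_t.
    for v, w in new_edges:
        adj[v].add(w)
        adj[w].add(v)


def pull_round(adj, rng):
    """One synchronous round of pull discovery (two-hop walk).

    Every node u asks a uniformly random neighbor v for a uniformly random
    neighbor w of v and adds the undirected edge (u, w); if the walk returns
    to u, nothing is added.
    """
    new_edges = []
    for u in adj:
        nbrs = sorted(adj[u])
        if not nbrs:
            continue
        v = rng.choice(nbrs)
        w = rng.choice(sorted(adj[v]))  # adj[v] contains u, so it is nonempty
        if w != u:
            new_edges.append((u, w))
    for u, w in new_edges:
        adj[u].add(w)
        adj[w].add(u)


def rounds_to_complete(adj, step, seed=0):
    """Number of rounds until a copy of the connected graph `adj` is complete."""
    rng = random.Random(seed)
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # leave the input untouched
    n = len(adj)
    rounds = 0
    while any(len(nbrs) < n - 1 for nbrs in adj.values()):
        step(adj, rng)
        rounds += 1
    return rounds


if __name__ == "__main__":
    # An 8-node path 0 - 1 - ... - 7 as a starting graph.
    path = {i: set() for i in range(8)}
    for i in range(7):
        path[i].add(i + 1)
        path[i + 1].add(i)
    print("push:", rounds_to_complete(path, push_round))
    print("pull:", rounds_to_complete(path, pull_round))
```

Applying all edges at the end of the round mirrors the synchronous-round model; updating the graph in place during a round would let later nodes in the loop see edges added in the same round.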

Our results. Our main contribution is an analysis of the above gossip-based discovery processes in both
undirected and directed graphs. In particular, we show the following results (the precise theorem statements
are in the respective sections).

• Undirected graphs: In Sections 3 and 4 we show that for any undirected $n$-node graph, both the
push and the pull discovery processes converge in $O(n\log^2 n)$ rounds with high probability. We
also show that $\Omega(n\log n)$ is a lower bound on the number of rounds needed for almost any $n$-node
graph. Hence our analysis is tight to within a logarithmic factor. Our results also apply when we
require only a subset of nodes to converge. In particular, consider a subset of $k$ nodes that induce
a connected subgraph, and run the gossip-based process restricted to this subgraph. Then by just
applying our results to this subgraph, we immediately obtain that it will take $O(k\log^2 k)$ rounds, with
high probability (in terms of $k$), for all the nodes in the subset to converge to a complete subgraph. As
discussed above, such a result is applicable in social network scenarios where all nodes in a subset of
network nodes discover one another through gossip-based processes.

• Directed graphs: In Section 5 we show that the pull process takes $O(n^2\log n)$ time for any $n$-node
directed graph, with high probability. We show a matching lower bound for weakly connected graphs,
and an $\Omega(n^2)$ lower bound for strongly connected directed graphs. Our analysis indicates that the
directionality of edges can greatly impede the resource discovery process.
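The directed variant of the pull process and its stopping condition (the transitive closure of the initial graph) can be sketched in the same style. This is a minimal illustration under our own assumptions: out-neighbor sets are stored in a dict, the initial graph has no self-loops, all choices in a round are made on $G_t$, and a node whose chosen out-neighbor has no out-neighbors simply does nothing that round. The names are hypothetical.

```python
import random
from collections import deque


def reachable_from(out, s):
    """Nodes reachable from s by a directed path of length >= 1, excluding s itself."""
    seen, queue = set(), deque([s])
    while queue:
        x = queue.popleft()
        for y in out[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    seen.discard(s)  # s may be on a cycle; self-loops are never added
    return seen


def directed_pull_round(out, rng):
    """Each node u takes a two-hop directed random walk u -> v -> w and adds u -> w."""
    new_edges = []
    for u in out:
        if not out[u]:
            continue
        v = rng.choice(sorted(out[u]))
        if not out[v]:
            continue  # assumption (ours): the walk is stuck at v this round
        w = rng.choice(sorted(out[v]))
        if w != u:
            new_edges.append((u, w))
    for u, w in new_edges:
        out[u].add(w)


def rounds_to_closure(out, seed=0):
    """Rounds until the graph equals the transitive closure of the input (no self-loops)."""
    rng = random.Random(seed)
    out = {u: set(succ) for u, succ in out.items()}
    target = {u: reachable_from(out, u) for u in out}
    rounds = 0
    while any(out[u] != target[u] for u in out):
        directed_pull_round(out, rng)
        rounds += 1
    return rounds


if __name__ == "__main__":
    cycle = {i: {(i + 1) % 6} for i in range(6)}  # strongly connected 6-cycle
    print(rounds_to_closure(cycle))
```

Every edge the walk adds points to a node already reachable from its source, so the out-neighborhoods only grow toward the precomputed targets.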

Applications. The gossip-based discovery processes we study are directly motivated by the two scenarios
outlined above, namely algorithms for resource discovery in distributed networks and analyzing how discovery
processes affect the evolution of social networks. Since our processes are simple, lightweight, and
easy to implement, they can be used for resource discovery in distributed networks. The Name Dropper
discovery algorithm has been applied to content delivery systems [?]. As mentioned earlier, Name Dropper
and other prior algorithms for the discovery problem [16, 24, 23, ?] complete in a polylogarithmic number
of rounds ($O(\log^2 n)$ or $O(\log n)$), but may transfer $O(n)$ bits per edge per round. As a result, they may
not be scalable for bandwidth- and resource-constrained networks (e.g., peer-to-peer, mobile, or sensor networks).
One approach to using these algorithms in a bandwidth-limited setting ($O(\log n)$ bits per message) is
to spread the transfer of long messages over a linear number of rounds, but this requires coordination and
maintaining state. In contrast, the "stateless" nature of the gossip processes we study and the fact that the
results apply to any initial graph make the processes attractive in unpredictable environments. Our analyses
can also give insight into the growth of real social networks such as LinkedIn, Twitter, or Facebook, which
grow in a decentralized way through the local actions of individual nodes. In addition to the application of
discovering all members of a group, analyses of processes such as the ones we study can help analyze
both the short-term and long-term evolution of social networks. In particular, they can help in predicting the number
of immediate neighbors as well as the number of second- and third-degree neighbors (these are listed for
every node in LinkedIn). An estimate of these can help in designing efficient algorithms and data structures
to search and navigate the social network.

⁴Again, one difference is that in the process we analyze, each node in the two-hop walk is chosen uniformly at
random from the appropriate neighborhood.

Technical contributions. Our main technical contribution is a probabilistic analysis of localized gossip-based
discovery in arbitrary networks. While our processes can be viewed as graph-based coupon collection
processes, one significant distinction from past work in this area [?] is that the graphs in our processes
are constantly changing. The dynamics and locality inherent in our processes introduce nontrivial dependencies,
which make it difficult to characterize the network as it evolves. A further challenge is posed by the
fact that the expected convergence time of the two processes is not monotonic; that is, the processes may
take longer to converge starting from a graph $G$ than starting from a subgraph $H$ of $G$. Figure 1(c) presents
a small example illustrating this phenomenon. This seemingly counterintuitive phenomenon is, however,
not surprising considering that the cover time of random walks has a similar property. One
consequence of these hurdles is that analyzing the convergence time even for highly specialized or regular
graphs is challenging, since the probability distributions of the intermediate graphs are hard to specify. Our
lower bound analysis for a specific strongly connected directed graph in Theorem 15 illustrates some of these
challenges. In our main upper bound results (Theorems 8 and 12), we overcome these technical difficulties
by presenting a uniform analysis for all graphs, in which we study different local neighborhood structures
and show how each leads to rapid growth in the minimum degree of the graph.



2 Preliminaries 

In this section, we define the notation used in our proofs and prove some common lemmas for Sections 3
and 4. Let $G$ denote a connected graph, $d(u)$ denote the degree of node $u$, and $N^i(u)$ denote the
set of nodes that are at distance $i$ from $u$. Let $\delta$ denote the minimum degree of $G$. We note that $G$, $d(u)$,
and $N^i(u)$ all change with time and are, in fact, random variables. For any nonnegative integer $t$, we use
subscript $t$ to denote the random variable at the start of round $t$; for example, $G_t$ refers to the graph at the
start of round $t$. For convenience, we list the notation in Table 1.



Table 1: Notation table

Notation               Description
$\delta_t$             minimum degree of graph $G_t$
$N^i_t(u)$             set of nodes that are at distance $i$ from $u$ in $G_t$
$|N^i_t(u)|$           number of nodes in $N^i_t(u)$
$d_t(u)$               degree of node $u$ in $G_t$
$d_t(u, N^i_t(v))$     number of edges from $u$ to nodes in $N^i_t(v)$, i.e., degree induced on $N^i_t(v)$
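The quantities in Table 1 map directly onto a breadth-first search over a snapshot $G_t$. The following minimal Python sketch assumes (our choice, not the paper's) that a graph is a dict mapping each node to the set of its neighbors; the helper names are hypothetical.

```python
from collections import deque


def distance_layers(adj, u):
    """Return {i: N^i(u)} for i >= 1, the sets of nodes at distance exactly i from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    layers = {}
    for x, d in dist.items():
        if d > 0:
            layers.setdefault(d, set()).add(x)
    return layers


def min_degree(adj):
    """delta_t: the minimum degree of the snapshot."""
    return min(len(nbrs) for nbrs in adj.values())


def degree_into(adj, u, nodes):
    """d_t(u, S): the number of edges from u to nodes in the set S."""
    return len(adj[u] & set(nodes))


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    layers = distance_layers(path, 0)
    print(layers, min_degree(path), degree_into(path, 1, layers[2]))
```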



We state two lemmas that are used in the proofs in Sections 3 and 4. Lemma 1 gives a lower
bound on the number of nodes within distance 4 of any node $u$ in $G_t$, while Lemma 2 is a standard
analysis of a sequence of Bernoulli experiments and can be proved by a direct coupon collector argument or
using a Chernoff bound. The proofs are included in Appendix A for completeness.
Lemma 1. $\left|\bigcup_{i=1}^{4} N^i_t(u)\right| \ge \min\{2\delta_t, n-1\}$ for all $u$ in $G_t$.
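Lemma 1 can be sanity-checked numerically; this is not a proof (the proof is in Appendix A). The sketch below builds random connected graphs with a generator of our own choosing (a random spanning tree plus random extra edges, an assumption made only for this check) and tests the inequality at every node. All names are hypothetical.

```python
import random
from collections import deque


def ball_size(adj, u, radius):
    """|N^1(u) ∪ ... ∪ N^radius(u)|: the number of nodes at distance 1..radius from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == radius:
            continue  # do not explore beyond the requested radius
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return len(dist) - 1  # exclude u itself


def random_connected_graph(n, extra, rng):
    """Random spanning tree on n nodes plus `extra` random additional edges (our generator)."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        j = rng.randrange(i)  # attach node i to an earlier node
        adj[i].add(j)
        adj[j].add(i)
    for _ in range(extra):
        a, b = rng.sample(range(n), 2)
        adj[a].add(b)
        adj[b].add(a)
    return adj


def lemma1_holds(adj):
    """Check |N^1(u) ∪ ... ∪ N^4(u)| >= min(2 * delta, n - 1) at every node u."""
    n = len(adj)
    delta = min(len(nbrs) for nbrs in adj.values())
    return all(ball_size(adj, u, 4) >= min(2 * delta, n - 1) for u in adj)


if __name__ == "__main__":
    rng = random.Random(1)
    print(all(lemma1_holds(random_connected_graph(30, e, rng)) for e in range(0, 200, 10)))
```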

Lemma 2. Consider $k$ Bernoulli experiments, in which the success probability of the $i$th experiment is at
least $i/m$, where $m \ge k$. If $X_i$ denotes the number of trials needed for experiment $i$ to output a success and
$X = \sum_{i=1}^{k} X_i$, then $\Pr[X > (c+1)\,m\ln n]$ is less than $1/n^c$.
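A small Monte Carlo experiment illustrates Lemma 2 in the extreme case where the $i$th success probability is exactly $i/m$, so each $X_i$ is geometric. The parameter values ($k = 20$, $m = n = 50$, $c = 1$) and the function names are arbitrary choices of ours; an empirical frequency is an illustration, not a proof.

```python
import math
import random


def sample_total_trials(k, m, rng):
    """Draw X = X_1 + ... + X_k, where X_i ~ Geometric(i/m) counts trials up to the first success."""
    total = 0
    for i in range(1, k + 1):
        p = i / m
        if p >= 1:
            total += 1  # certain success on the first trial
            continue
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
        # Inverse transform: Pr[X_i > j] = (1 - p)**j for integers j >= 0.
        total += max(1, math.ceil(math.log(u) / math.log(1 - p)))
    return total


def tail_frequency(k, m, n, c, trials=5000, seed=0):
    """Empirical frequency of X > (c + 1) * m * ln(n), to compare with the bound 1 / n**c."""
    rng = random.Random(seed)
    threshold = (c + 1) * m * math.log(n)
    hits = sum(sample_total_trials(k, m, rng) > threshold for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    k, m, n, c = 20, 50, 50, 1
    print(tail_frequency(k, m, n, c), "vs bound", 1 / n**c)
```

Sampling each geometric variable by inverse transform keeps the experiment fast; simulating the individual Bernoulli trials would give the same distribution.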

3 Proofs for the triangulation process 

In this section, we analyze the triangulation process on undirected connected graphs, which is described by
the following simple iteration: in each round, for each node $u$, we add edge $(v, w)$, where $v$ and $w$ are drawn
uniformly at random from $N^1_t(u)$. The triangulation process yields the following push-based resource
discovery protocol: in each round, each node $u$ introduces two random neighbors $v$ and $w$ to one another.
The main result of this section is that the triangulation process transforms an arbitrary connected $n$-node
graph into a complete graph in $O(n\log^2 n)$ rounds with high probability. We also establish an $\Omega(n\log n)$
lower bound on the triangulation process for almost all $n$-node graphs.

3.1 Upper bound 

We obtain the $O(n\log^2 n)$ upper bound by proving that the minimum degree of the graph increases by a
constant factor (or equals $n-1$) in $O(n\log n)$ steps. Towards this objective, we study how the neighbors
of a given node connect to the two-hop neighbors of the node. We say that a node $v$ is weakly tied to a set
of nodes $S$ if $v$ has fewer than $\delta_0/2$ edges to $S$ (i.e., $d_t(v, S) < \delta_0/2$), and strongly tied to $S$ if $v$ has at least
$\delta_0/2$ edges to $S$ (i.e., $d_t(v, S) \ge \delta_0/2$). (Recall that $\delta_0$ is the minimum degree at the start of round 0.)
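The strongly/weakly tied classification follows directly from the definition. In this illustrative Python helper (our own; the graph is a dict of neighbor sets, $\delta_0$ is passed in explicitly, and the names are hypothetical), each neighbor $w$ of $u$ is labeled by comparing $d_t(w, N^2_t(u))$ with $\delta_0/2$.

```python
def two_hop_set(adj, u):
    """N^2(u): nodes at distance exactly 2 from u."""
    reach = set()
    for w in adj[u]:
        reach |= adj[w]
    return reach - adj[u] - {u}


def classify_neighbors(adj, u, delta0):
    """Split N^1(u) into the nodes strongly and weakly tied to N^2(u).

    A neighbor w is strongly tied if d(w, N^2(u)) >= delta0 / 2 and weakly tied otherwise.
    """
    n2 = two_hop_set(adj, u)
    strong, weak = set(), set()
    for w in adj[u]:
        if len(adj[w] & n2) >= delta0 / 2:
            strong.add(w)
        else:
            weak.add(w)
    return strong, weak


if __name__ == "__main__":
    g = {0: {1, 2}, 1: {0, 3, 4}, 2: {0}, 3: {1}, 4: {1}}
    print(classify_neighbors(g, 0, delta0=1))
```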

Lemma 3. If $\delta_0 \le d_t(u) < (1+1/4)\delta_0$ and $w \in N^1_t(u)$ is strongly tied to $N^2_t(u)$, then the probability
that $u$ connects to a node in $N^2_t(u)$ through $w$ in round $t$ is at least $2/(7n)$.

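Proof. Since $w$ is strongly tied to $N^2_t(u)$, $d_t(w, N^2_t(u)) \ge \delta_0/2$. Moreover, $d_t(w) \le |N^1_t(u)| + d_t(w, N^2_t(u))$,
$|N^1_t(u)| = d_t(u) < (1+1/4)\delta_0$, and $x/(a+x)$ is increasing in $x$. Therefore, the probability that $u$ connects
to a node in $N^2_t(u)$ through $w$ in round $t$ is

$$\frac{d_t(w, N^2_t(u))}{d_t(w)} \cdot \frac{1}{d_t(w)} \ge \frac{d_t(w, N^2_t(u))}{d_t(w)} \cdot \frac{1}{n} \ge \frac{d_t(w, N^2_t(u))}{|N^1_t(u)| + d_t(w, N^2_t(u))} \cdot \frac{1}{n}$$

$$\ge \frac{d_t(w, N^2_t(u))}{(1+1/4)\delta_0 + d_t(w, N^2_t(u))} \cdot \frac{1}{n} \ge \frac{\delta_0/2}{(1+1/4)\delta_0 + \delta_0/2} \cdot \frac{1}{n} = \frac{2}{7n}.$$

□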

Lemma 4. If $\delta_0 \le d_t(u) < (1+1/4)\delta_0$, $w \in N^1_t(u)$ is weakly tied to $N^2_t(u)$, and $v \in N^2_t(u) \cap N^1_t(w)$,
then the probability that $u$ connects to $v$ through $w$ in round $t$ is at least $1/(4\delta_0^2)$.

Proof. Since $w$ is weakly tied to $N^2_t(u)$ and $d_t(w)$ is at most $|N^1_t(u)| + d_t(w, N^2_t(u))$, we obtain that
$d_t(w)$ is at most $(1+1/4)\delta_0 + \delta_0/2$. Therefore, the probability that $u$ connects to $v$ through $w$ in round $t$ is at least

$$\frac{1}{d_t(w)^2} \ge \frac{1}{((1+1/4)\delta_0 + \delta_0/2)^2} = \frac{1}{(7\delta_0/4)^2} \ge \frac{1}{4\delta_0^2}.$$

□
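The constants in Lemmas 3 and 4 come from simple ratio arithmetic once $|N^1_t(u)| = d_t(u) < (1+1/4)\delta_0$ is used. A short exact check with Python's standard `fractions.Fraction`, writing every quantity as a multiple of $\delta_0$ so that $\delta_0$ cancels (variable names are ours):

```python
from fractions import Fraction

# Lemma 3: ratio (delta0/2) / ((1 + 1/4) * delta0 + delta0/2), in units of delta0;
# the remaining factor 1/d_t(w) is bounded below by 1/n.
lemma3_ratio = Fraction(1, 2) / (Fraction(5, 4) + Fraction(1, 2))

# Lemma 4: d_t(w) <= (1 + 1/4) * delta0 + delta0/2 = (7/4) * delta0, so the pair
# probability is at least 1 / ((7/4) * delta0)**2 = (16/49) / delta0**2.
lemma4_coeff = 1 / (Fraction(5, 4) + Fraction(1, 2)) ** 2

print(lemma3_ratio, lemma4_coeff, lemma4_coeff >= Fraction(1, 4))  # prints: 2/7 16/49 True
```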

For analyzing the growth in the degree of a node $u$, we consider two overlapping cases. The first case
is when more than $\delta_0/4$ nodes of $N^1_t(u)$ are strongly tied to $N^2_t(u)$, and the second is when fewer than $\delta_0/3$
nodes of $N^1_t(u)$ are strongly tied to $N^2_t(u)$. The analysis for the first case is relatively straightforward:
when several neighbors of a node $u$ are strongly tied to $u$'s two-hop neighbors, their triangulation steps
connect $u$ to a large fraction of these two-hop neighbors.
[Figure 2 (diagram): Lemma 5 covers the case where several neighbors are strongly tied to two-hop neighbors; Lemma 6 covers the case where many neighbors are weakly tied to two-hop neighbors but at least one is strongly tied; Lemma 7 covers the case where all neighbors are weakly tied to two-hop neighbors. Together they show that the minimum degree of the graph increases by a constant factor, and hence that the triangulation process completes in $O(n\log^2 n)$ rounds.]

Figure 2: This figure illustrates the different cases and relations between lemmas used in the proof of Theorem 8. The
shaded nodes in $N^1_t(u)$ are strongly tied to $N^2_t(u)$. Others are weakly tied to $N^2_t(u)$.



Lemma 5 (When several neighbors are strongly tied to two-hop neighbors). There exists $T = O(n\log n)$
such that if more than $\delta_0/4$ nodes in $N^1_t(u)$ are strongly tied to $N^2_t(u)$ for all $t < T$, then $d_T(u) \ge
(1+1/4)\delta_0$ with probability at least $1 - 1/n^2$.

Proof. If at any round $t < T$, $d_t(u) \ge (1+1/4)\delta_0$, then the claim of the lemma holds. In the remainder
of this proof, we assume $d_t(u) < (1+1/4)\delta_0$ for all $t < T$. Let $w \in N^1_t(u)$ be a node that is strongly tied
to $N^2_t(u)$. By Lemma 3 we know that

$$\Pr[u \text{ connects to a node in } N^2_t(u) \text{ through } w \text{ in round } t] \ge \frac{2}{7n} \ge \frac{1}{6n}.$$

We have more than $\delta_0/4$ such $w$'s in $N^1_t(u)$, each of which independently executes a triangulation step in
any given round. Consider a run of $T_1 = 72n\ln n/\delta_0$ rounds. This implies at least $18n\ln n$ attempts to add
an edge between $u$ and a node in $N^2_t(u)$. Thus,

$$\Pr[u \text{ connects to a node in } N^2_t(u) \text{ within } T_1 \text{ rounds}] \ge 1 - \left(1 - \frac{1}{6n}\right)^{18n\ln n} \ge 1 - e^{-3\ln n} = 1 - \frac{1}{n^3}.$$

If a node that is two hops away from $u$ becomes a neighbor of $u$ by round $t$, it is no longer in $N^2_t(u)$.
Therefore, in $T = T_1\delta_0/4 = O(n\log n)$ rounds, $u$ will connect to at least $\delta_0/4$ new nodes with probability
at least $1 - 1/n^2$ (by a union bound over the $\delta_0/4$ phases of $T_1$ rounds each), i.e., $d_T(u) \ge (1+1/4)\delta_0$. □
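The amplification step in this proof rests on $(1-p)^N \le e^{-pN}$: with $p = 1/(6n)$ and $N = 18n\ln n$ attempts, the failure probability is at most $e^{-3\ln n} = n^{-3}$. A quick numerical check for a few illustrative values of $n$ (a sanity check of ours, not part of the proof):

```python
import math


def lemma5_failure(n):
    """(1 - 1/(6n)) ** (18 n ln n): probability that all 18 n ln n attempts fail."""
    return (1 - 1 / (6 * n)) ** (18 * n * math.log(n))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10**6):
        # The value should never exceed exp(-3 ln n) = n ** -3.
        print(n, lemma5_failure(n), n ** -3)
```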

We next consider the second case, where fewer than $\delta_0/3$ neighbors of a given node $u$ are strongly tied
to the two-hop neighborhood of $u$. This case is more challenging, since the neighbors of $u$ that are weakly
tied may not contribute many new edges to $u$. We break the analysis of this part into two subcases based on
whether there is at least one neighbor of $u$ that is strongly tied to $N^2_0(u)$. Figure 2 illustrates the different
cases and lemmas used in the proof of Theorem 8.

Lemma 6 (When few neighbors are strongly tied to two-hop neighbors). There exists $T = O(n\log n)$
such that if fewer than $\delta_0/3$ nodes in $N^1_t(u)$ are strongly tied to $N^2_t(u)$ for all $t < T$, and there exists a node
$v_0 \in N^1_0(u)$ that is strongly tied to $N^2_0(u)$, then $d_T(u) \ge (1+1/8)\delta_0$ with probability at least $1 - 1/n^2$.
Proof. If at any point $t < T$, $d_t(u) \ge (1+1/8)\delta_0$, then the claim of the lemma holds. In the remainder
of this proof, we assume $d_t(u) < (1+1/8)\delta_0$ for all $t < T$. Let $S^0_t$ denote the set of $v_0$'s neighbors in
$N^2_t(u)$ that are strongly tied to $N^1_t(u)$ at round $t$, and let $W^0_t$ denote the set of $v_0$'s neighbors in $N^2_t(u)$ that
are weakly tied to $N^1_t(u)$ at round $t$.

Consider any node $v$ in $S^0_t$. Fewer than $\delta_0/3$ nodes in $N^1_t(u)$ are strongly tied to $N^2_t(u)$; thus more than
$\delta_0/2 - \delta_0/3 = \delta_0/6$ neighbors of $v$ in $N^1_t(u)$ are weakly tied to $N^2_t(u)$. Let $w$ be one such weakly tied
node. By Lemma 4 the probability that $u$ connects to $v$ through $w$ in round $t$ is at least $1/(4\delta_0^2)$. We have at
least $\delta_0/6$ such $w$'s, each of which executes a triangulation step each round. Consider $T = 72\delta_0\ln n$ rounds
of the process. Then the probability that $u$ connects to $v$ in $T$ rounds is at least

$$1 - \left(1 - \frac{1}{4\delta_0^2}\right)^{12\delta_0^2\ln n} \ge 1 - e^{-3\ln n} = 1 - \frac{1}{n^3}.$$

Thus, if $|S^0_t| \ge \delta_0/8$, in an additional $O(n\log n)$ rounds, $d_T(u) \ge (1+1/8)\delta_0$ with probability at least
$1 - 1/n^2$.
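The same inequality $(1-p)^N \le e^{-pN}$ gives the bound in this step: at least $\delta_0/6$ weakly tied nodes $w$ each succeed with probability at least $1/(4\delta_0^2)$ per round, so $72\delta_0\ln n$ rounds give at least $12\delta_0^2\ln n$ attempts, and the failure probability is at most $e^{-3\ln n} = n^{-3}$. A numerical sanity check of ours for a few illustrative $(\delta_0, n)$ pairs:

```python
import math


def lemma6_failure(delta0, n):
    """(1 - 1/(4 delta0^2)) ** attempts, with attempts = (delta0 / 6) * 72 delta0 ln n."""
    attempts = (delta0 / 6) * (72 * delta0 * math.log(n))
    return (1 - 1 / (4 * delta0 ** 2)) ** attempts


if __name__ == "__main__":
    for delta0, n in ((2, 10), (10, 100), (50, 1000), (100, 10**4)):
        # The value should never exceed n ** -3.
        print(delta0, n, lemma6_failure(delta0, n), n ** -3)
```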

Therefore, in the remainder of the proof we consider the case where $|S^0_t| < \delta_0/8$. Define $R^0_t = R^0_{t-1} \cup
W^0_t$ and $R^0_0 = W^0_0$. If at least $\delta_0/8$ nodes in $R^0_t$ are connected to $u$ at any time, then the claim of the lemma
holds. Thus, in the following we consider the case where $|R^0_t \cap N^1_t(u)| < \delta_0/8$. From the definition of $R^0_t$,
we can derive

$$|R^0_t| \ge |W^0_t| = d_t(v_0, N^2_t(u)) - |S^0_t| > d_t(v_0, N^2_t(u)) - \delta_0/8.$$

At round 0, $v_0$ is strongly tied to $N^2_0(u)$, i.e., $d_0(v_0, N^2_0(u)) \ge \delta_0/2$. Since $\delta_0 \le d_t(u) < (1+1/8)\delta_0$,
fewer than $\delta_0/8$ nodes of $N^2_0(u)$ can have become neighbors of $u$ by round $t$, so we have

$$d_t(v_0, N^2_t(u)) \ge d_0(v_0, N^2_0(u)) - \delta_0/8 \ge 3\delta_0/8.$$
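Let $e_1$ denote the event {$u$ connects to a node in $R^0_t \setminus N^1_t(u)$ through $v_0$ in round $t$}. Then

$$\begin{aligned}
\Pr[e_1] &\ge \frac{|R^0_t \setminus N^1_t(u)|}{d_t(v_0)} \cdot \frac{1}{d_t(v_0)} = \frac{|R^0_t| - |R^0_t \cap N^1_t(u)|}{d_t(v_0)} \cdot \frac{1}{d_t(v_0)} \\
&\ge \frac{|R^0_t| - |R^0_t \cap N^1_t(u)|}{d_t(v_0)} \cdot \frac{1}{n} \ge \frac{|R^0_t| - |R^0_t \cap N^1_t(u)|}{|N^1_t(u)| + d_t(v_0, N^2_t(u))} \cdot \frac{1}{n} \\
&\ge \frac{|R^0_t| - \delta_0/8}{|N^1_t(u)| + d_t(v_0, N^2_t(u))} \cdot \frac{1}{n} \ge \frac{d_t(v_0, N^2_t(u)) - \delta_0/8 - \delta_0/8}{|N^1_t(u)| + d_t(v_0, N^2_t(u))} \cdot \frac{1}{n} \\
&\ge \frac{3\delta_0/8 - \delta_0/8 - \delta_0/8}{|N^1_t(u)| + 3\delta_0/8} \cdot \frac{1}{n} \ge \frac{3\delta_0/8 - \delta_0/8 - \delta_0/8}{(1+1/8)\delta_0 + 3\delta_0/8} \cdot \frac{1}{n} = \frac{1}{12n}.
\end{aligned}$$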
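The last steps of this chain use the fact that the bound is increasing in $d_t(v_0, N^2_t(u))$: for $x = d_t(v_0, N^2_t(u))$ and $a = |N^1_t(u)|$, the derivative of $(x - \delta_0/4)/(a + x)$ with respect to $x$ is $(a + \delta_0/4)/(a+x)^2 > 0$, so the lower bound $3\delta_0/8$ may be substituted in numerator and denominator together, along with $a < (1+1/8)\delta_0$. The resulting constant can be checked exactly in units of $\delta_0$ (variable names are ours):

```python
from fractions import Fraction

numerator = Fraction(3, 8) - Fraction(1, 8) - Fraction(1, 8)  # (3/8 - 1/8 - 1/8) * delta0
denominator = Fraction(9, 8) + Fraction(3, 8)                 # (1 + 1/8) * delta0 + 3 * delta0 / 8
coefficient = numerator / denominator                         # coefficient of 1/n in the bound on Pr[e_1]

print(coefficient)  # prints: 1/12
```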

Let $X_1$ be the number of rounds it takes for $e_1$ to occur. When $e_1$ occurs, let $v_1$ denote a witness for $e_1$; i.e.,
if we use $X_1$ to denote the round at which $e_1$ occurs, then let $v_1$ denote a node in $R^0_{X_1} \setminus N^1_{X_1}(u)$ to which
$u$ connects through $v_0$ in round $X_1$. Since $v_1$ is in $R^0_{X_1}$, it is also in $W^0_{t_1}$ for some $t_1 \le X_1$; therefore, $v_1$
is strongly tied to $N^2_{t_1}(u) \cup N^3_{t_1}(u)$. If $d_t(v_1, N^2_t(u)) < 3\delta_0/8$ at any point $t$, then $d_t(u) \ge (1+1/8)\delta_0$.
Thus, in the remainder of the proof, we consider the case where $d_t(v_1, N^2_t(u)) \ge 3\delta_0/8$. Let $S^1_t$ (resp.,
$W^1_t$) denote the set of $v_1$'s neighbors in $N^2_t(u)$ that are strongly tied (resp., weakly tied) to $N^1_t(u)$. If
$|S^1_t| \ge \delta_0/8$, then, as we did for the case $|S^0_t| \ge \delta_0/8$, we argue that in $O(n\log n)$ rounds the degree of $u$ is
at least $(1+1/8)\delta_0$ with probability at least $1 - 1/n^2$.

Thus, in the remainder, we assume that $|S^1_t| < \delta_0/8$. Define $R^1_t = R^1_{t-1} \cup W^1_t$ and $R^1_{X_1} = W^1_{X_1}$. Let $e_2$
denote the event {$u$ connects to a node in $R^0_t \setminus N^1_t(u)$ (or $R^1_t \setminus N^1_t(u)$) through $v_0$ (or $v_1$) in round $t$}. By
the same calculation as for $v_0$, we have $\Pr[e_2] \ge 1/(6n)$. Similarly, we can define $e_3, X_3, e_4, X_4, \ldots, e_{\delta_0/4}, X_{\delta_0/4}$,
and obtain that $\Pr[e_i] \ge i/(12n)$. The total number of rounds for $u$ to gain $\delta_0/4$ edges is bounded by
$T = \sum_i X_i$. By Lemma 2, $T \le 36n\ln n$ with probability at least $1 - 1/n^2$, completing the proof. □
Lemma 7 (When all neighbors are weakly tied to two-hop neighbors). There exists $T = O(n\log n)$ such
that if all nodes in $N^1_t(u)$ are weakly tied to $N^2_t(u)$ for all $t < T$, then $d_T(u) \ge \min\{(1+1/8)\delta_0, n-1\}$
with probability at least $1 - 1/n^2$.

Proof. If at any point $t < T$, $d_t(u) \ge \min\{(1+1/8)\delta_0, n-1\}$, then the claim of this lemma holds. In
the remainder of this proof, we assume $d_t(u) < \min\{(1+1/8)\delta_0, n-1\}$ for all $t < T$. In the following,
we first show that any node $v \in N^2_0(u)$ will have at least $\delta_0/4$ edges to $N^1_{T_1}(u)$, where $T_1 = O(n\log n)$. After
that, $v$ will connect to $u$ in $T_2 = O(n\log n)$ rounds. Therefore, the total number of rounds used for $v$ to
connect to $u$ is $T_3 = T_1 + T_2 = O(n\log n)$.

Node $v$ is connected to at least one node in $N^1_0(u)$; call it $w_1$. Because all nodes in $N^1_t(u)$ are weakly tied to
$N^2_t(u)$, we have $d_t(w_1, N^1_t(u)) \ge \delta_0 - \delta_0/2 = \delta_0/2$. If $d_t(w_1, N^1_t(u) \setminus N^1_t(v)) < \delta_0/4$, then $v$ already
has $\delta_0/4$ edges to $N^1_t(u)$. Thus, in the following we consider the case where $d_t(w_1, N^1_t(u) \setminus N^1_t(v)) \ge
\delta_0/4$. Let $e_1$ denote the event {$v$ connects to a node in $N^1_t(u) \setminus N^1_t(v)$ through $w_1$}. Then

$$\Pr[e_1] \ge \frac{d_t(w_1, N^1_t(u) \setminus N^1_t(v))}{d_t(w_1)} \cdot \frac{1}{d_t(w_1)} \ge \frac{d_t(w_1, N^1_t(u) \setminus N^1_t(v))}{|N^1_t(u)| + d_t(w_1, N^2_t(u))} \cdot \frac{1}{d_t(w_1)}$$

$$\ge \frac{\delta_0/4}{(1+1/8)\delta_0 + \delta_0/2} \cdot \frac{1}{d_t(w_1)} \ge \frac{2}{13} \cdot \frac{1}{n} > \frac{1}{7n}.$$
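The constant in this bound on $\Pr[e_1]$ can be checked exactly with Python's `fractions.Fraction`, using $d_t(w_1, N^1_t(u) \setminus N^1_t(v)) \ge \delta_0/4$, $|N^1_t(u)| < (1+1/8)\delta_0$, and $d_t(w_1, N^2_t(u)) < \delta_0/2$ (since $w_1$ is weakly tied), all in units of $\delta_0$; the variable name is ours.

```python
from fractions import Fraction

# (delta0/4) / ((1 + 1/8) * delta0 + delta0/2), in units of delta0.
ratio = Fraction(1, 4) / (Fraction(9, 8) + Fraction(1, 2))

print(ratio, ratio > Fraction(1, 7))  # prints: 2/13 True
```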

Let $X_1$ be the number of rounds needed for $e_1$ to occur. When $e_1$ occurs, let $w_2$ denote a witness for $e_1$,
i.e., let $w_2$ denote a vertex in $N^1_t(u) \setminus N^1_t(v)$ to which $v$ connects. Note that here the value of $t$ is the
round at which the event occurs. By our choice, $w_2$ is also weakly tied to $N^2_t(u)$. By an argument similar
to the one in the above paragraph, we have $d_t(w_2, N^1_t(u) \setminus N^1_t(v)) \ge \delta_0/4$. Let $e_2$ denote the event
{$v$ connects to a node in $N^1_t(u) \setminus N^1_t(v)$ through $w_1$ or $w_2$}. We have $\Pr[e_2] \ge 2/(7n)$. Let $X_2$ be the number of
rounds needed for $e_2$ to occur. Similarly, we can define $e_3, X_3, \ldots, e_{\delta_0/4}, X_{\delta_0/4}$ and show $\Pr[e_i] \ge i/(7n)$.
Set $T_1 = \sum_i X_i$, which is the bound on the number of rounds needed for $v$ to have at least $\delta_0/4$ neighbors
in $N^1_t(u)$. By Lemma 2, $T_1 \le 28n\ln n$ with probability at least $1 - 1/n^3$. Now we show that $v$ will connect to
$u$ within $T_2$ rounds after this. Notice that all $w_i$'s are still weakly tied to $N^2_t(u)$. By Lemma 4, the probability
that $u$ connects to $v$ through $w_i$ in round $t$ is at least $1/(4\delta_0^2)$. We have $w_1, w_2, \ldots, w_{\delta_0/4}$ independently
executing a triangulation step each round. Consider $T_2 = 48\delta_0\ln n$ rounds of the process. Then,

$$\Pr[u \text{ connects to } v \text{ in } T_2 \text{ rounds}] \ge 1 - \left(1 - \frac{1}{4\delta_0^2}\right)^{12\delta_0^2\ln n} \ge 1 - \frac{1}{n^3}.$$

We have shown that any node $v \in N^2_0(u)$ will connect to $u$ by round $T_3 = T_1 + T_2$ with probability
at least $1 - 1/n^3$. This implies that by round $T_3$, $u$ will connect to all nodes in $N^2_0(u)$ with probability at
least $1 - |N^2_0(u)|/n^3$. Then $N^2_0(u) \subseteq N^1_{T_3}(u)$, $N^3_0(u) \subseteq N^1_{T_3}(u) \cup N^2_{T_3}(u)$, and $N^4_0(u) \subseteq N^1_{T_3}(u) \cup
N^2_{T_3}(u) \cup N^3_{T_3}(u)$. We apply the above analysis twice more, and obtain that by round $T = 3T_3 = O(n\log n)$,
$N^2_0(u) \cup N^3_0(u) \cup N^4_0(u) \subseteq N^1_T(u)$ with probability at least $1 - |N^2_0(u) \cup N^3_0(u) \cup N^4_0(u)|/n^3 \ge
1 - 1/n^2$. Since edges are never removed, $N^1_0(u) \subseteq N^1_T(u)$ as well, so by Lemma 1, $d_T(u) \ge |\bigcup_{i=1}^{4} N^i_0(u)| \ge \min\{2\delta_0, n-1\}$, thus completing the proof. □

Theorem 8 (Upper bound for triangulation process). For any connected undirected graph, the triangulation
process converges to a complete graph in $O(n\log^2 n)$ rounds with high probability.

Proof. We first show that in $O(n\log n)$ rounds, either the graph becomes complete or its minimum degree
increases by a factor of at least $1 + 1/12$. Then we apply this argument $O(\log n)$ times to complete the proof.

For each $u$ where $d_0(u) < \min\{(1+1/8)\delta_0, n-1\}$, we consider the following two cases. The first case
is if more than $\delta_0/3$ nodes in $N^1_0(u)$ are strongly tied to $N^2_0(u)$. By Lemma 5, there exists $T = O(n\log n)$
such that if at least $\delta_0/4$ nodes in $N^1_t(u)$ are strongly tied to $N^2_t(u)$ for $t < T$, then $d_T(u) \ge (1+1/8)\delta_0$
with probability at least $1 - 1/n^2$. Whenever the condition is not satisfied, i.e., fewer than $\delta_0/4$ nodes in
$N^1_t(u)$ are strongly tied to $N^2_t(u)$, more than $\delta_0/3 - \delta_0/4 = \delta_0/12$ strongly tied nodes have become
weakly tied. By the definitions of strongly tied and weakly tied, this implies $d_t(u) \ge (1+1/12)\delta_0$.

The second case is when fewer than $\delta_0/3$ nodes in $N_0^1(u)$ are strongly tied to $N_0^2(u)$. By Lemmas 6 and 7, there exists $T = O(n \log n)$ such that if we remain in this case for $T$ rounds, then $d_T(u) \ge \min\{(1 + 1/8)\delta_0, n - 1\}$ with probability at least $1 - 1/n^2$. Whenever this condition is not satisfied, i.e., more than $\delta_0/3$ nodes in $N_t^1(u)$ are strongly tied to $N_t^2(u)$, we move to the analysis in the first case, and $d_T(u) \ge (1 + 1/8)\delta_0$ in $T = O(n \log n)$ rounds with probability at least $1 - 1/n^2$.

Combining the above two cases and applying a union bound over all nodes, we obtain $\delta_T \ge \min\{(1 + 1/8)\delta_0, n - 1\}$ in $T = O(n \log n)$ rounds with probability at least $1 - 1/n$. Applying the above argument $O(\log n)$ times yields the desired $O(n \log^2 n)$ upper bound. □
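To get a feel for the process itself, here is a minimal simulation sketch of the triangulation (push) process. It assumes a synchronous round model in which every node with at least two neighbours connects two distinct, uniformly random neighbours, and all edges chosen in a round are applied at the end of that round; the helper names (`path_graph`, `triangulation_rounds`) are hypothetical. It illustrates convergence and reports rounds relative to $n \ln^2 n$; it does not verify the bound.

```python
import math
import random

def path_graph(n):
    """Undirected path 0 - 1 - ... - (n-1) as a dict of neighbour sets."""
    adj = {v: set() for v in range(n)}
    for v in range(n - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    return adj

def is_complete(adj):
    n = len(adj)
    return all(len(nbrs) == n - 1 for nbrs in adj.values())

def triangulation_rounds(adj, rng, max_rounds=10**6):
    """Run the push (triangulation) process in place until the graph is complete.

    Assumed round model: every node with at least two neighbours picks two
    distinct neighbours uniformly at random and connects them; all edges chosen
    in a round are added at the end of that round.
    """
    rounds = 0
    while not is_complete(adj):
        if rounds == max_rounds:
            raise RuntimeError("no convergence within max_rounds")
        new_edges = []
        for v in adj:
            if len(adj[v]) >= 2:
                a, b = rng.sample(sorted(adj[v]), 2)
                new_edges.append((a, b))
        for a, b in new_edges:
            adj[a].add(b)
            adj[b].add(a)
        rounds += 1
    return rounds

rng = random.Random(1)
for n in (16, 32, 64):
    r = triangulation_rounds(path_graph(n), rng)
    print(n, r, round(r / (n * math.log(n) ** 2), 3))  # rounds and rounds / (n ln^2 n)
```

A path is a natural stress test: it has minimum degree 1 and maximum diameter among connected graphs.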



3.2 Lower bound 

The proof of the following theorem is in Appendix B.

Theorem 9 (Lower bound for triangulation process). For any connected undirected graph $G$ that has $k > 1$ fewer edges than the complete graph, the triangulation process takes $\Omega(n \log k)$ steps to complete with probability at least $1 - O(e^{-k^{1/4}})$.



4 The two-hop walk: Discovery through pull 

In this section, we analyze the two-hop walk process on undirected connected graphs, which is described by the following simple iteration: in each round, for each node $u$, we add edge $(u, w)$, where $w$ is drawn uniformly at random from $N_t^1(v)$ and $v$ is drawn uniformly at random from $N_t^1(u)$. The two-hop walk yields the following pull-based resource discovery protocol: in each round, each node $u$ contacts a random neighbor $v$, receives the identity of a random neighbor $w$ of $v$, and sends its own identity to $w$. The main result of this section is that the two-hop walk process transforms an arbitrary connected $n$-node graph into a complete graph in $O(n \log^2 n)$ rounds with high probability. We also establish an $\Omega(n \log n)$ lower bound on the two-hop walk for almost all $n$-node graphs.
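The pull protocol described above can be sketched as a short simulation. This is a minimal sketch under an assumed synchronous model: each node picks a uniformly random neighbour $v$, then a uniformly random neighbour $w$ of $v$, and the edge $\{u, w\}$ is added at the end of the round (nothing is added when $w = u$). The function names are hypothetical.

```python
import math
import random

def path_graph(n):
    """Undirected path 0 - 1 - ... - (n-1) as a dict of neighbour sets."""
    adj = {v: set() for v in range(n)}
    for v in range(n - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)
    return adj

def is_complete(adj):
    n = len(adj)
    return all(len(nbrs) == n - 1 for nbrs in adj.values())

def two_hop_rounds(adj, rng, max_rounds=10**6):
    """Pull-based two-hop walk, run in place until the graph is complete.

    Assumed synchronous round: each node u picks a uniformly random neighbour v
    and then a uniformly random neighbour w of v; the undirected edge {u, w}
    is added at the end of the round (nothing is added when w == u).
    """
    rounds = 0
    while not is_complete(adj):
        if rounds == max_rounds:
            raise RuntimeError("no convergence within max_rounds")
        new_edges = []
        for u in adj:
            v = rng.choice(sorted(adj[u]))  # first hop: random neighbour of u
            w = rng.choice(sorted(adj[v]))  # second hop: random neighbour of v
            if w != u:
                new_edges.append((u, w))
        for u, w in new_edges:
            adj[u].add(w)
            adj[w].add(u)
        rounds += 1
    return rounds

rng = random.Random(2)
for n in (16, 32, 64):
    r = two_hop_rounds(path_graph(n), rng)
    print(n, r, round(r / (n * math.log(n) ** 2), 3))  # rounds and rounds / (n ln^2 n)
```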

As for the triangulation process, we establish the $O(n \log^2 n)$ upper bound by showing that the minimum degree of the graph increases by a constant factor (or equals $n - 1$) in $O(n \log n)$ rounds with high probability. To analyze the growth in the degree of a node $u$, we consider two overlapping cases. The first case is when the two-hop neighborhood of $u$ is not too large, i.e., $|N_t^2(u)| < \delta_0/2$, and the second is when the two-hop neighborhood of $u$ is not too small, i.e., $|N_t^2(u)| \ge \delta_0/4$. The proofs of the following three claims, which establish the upper bound, are deferred to Appendix C.

Lemma 10 (When the two-hop neighborhood is not too large). There exists $T = O(n \log n)$ such that either $|N_T^2(u)| \ge \delta_0/2$ or $d_T(u) \ge \min\{2\delta_0, n - 1\}$ with probability at least $1 - 1/n^2$.

Lemma 11 (When the two-hop neighborhood is not too small). There exists $T = O(n \log n)$ such that either $|N_T^2(u)| < \delta_0/4$ or $d_T(u) \ge \min\{(1 + 1/8)\delta_0, n - 1\}$ with probability at least $1 - 1/n^2$.

Theorem 12 (Upper bound for two-hop walk process). For connected undirected graphs, the two-hop walk process completes in $O(n \log^2 n)$ rounds with high probability.

The proof of Theorem 13 is essentially the same as that for Theorem 9 and is omitted.

Theorem 13 (Lower bound for two-hop walk process). For any connected undirected graph $G$ that has $k > 1$ fewer edges than the complete graph, the two-hop process takes $\Omega(n \log k)$ steps to complete with probability at least $1 - O(e^{-k^{1/4}})$.
5 Two-hop walk in directed graphs



In this section, we analyze the two-hop walk process in directed graphs. We say that the process terminates at time $t$ if for every node $u$ and every node $v$, $G_t$ contains the edge $(u, v)$ whenever $u$ has a path to $v$ in $G_0$.
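The termination condition can be checked operationally: compute the reachability closure of $G_0$ and test whether $G_t$ contains every closure pair. The sketch below does this with breadth-first search and runs a directed two-hop walk under an assumed synchronous model (each node follows a random out-edge, then a random out-edge of that neighbour, and new edges are applied at the end of the round); `has_terminated` and `directed_two_hop` are hypothetical names.

```python
import random
from collections import deque

def reachable_from(out, s):
    """Nodes reachable from s along directed paths of length >= 1 (s itself excluded)."""
    seen, queue = set(), deque([s])
    while queue:
        x = queue.popleft()
        for y in out[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    seen.discard(s)
    return seen

def has_terminated(g0, gt):
    """True iff G_t contains (u, v) whenever u has a path to v in G_0."""
    return all(reachable_from(g0, u) <= gt[u] for u in g0)

def directed_two_hop(g0, rng, max_rounds=10**6):
    """Directed two-hop walk until termination (assumed synchronous rounds)."""
    closure = {u: reachable_from(g0, u) for u in g0}
    g = {u: set(vs) for u, vs in g0.items()}
    rounds = 0
    while not all(closure[u] <= g[u] for u in g):
        if rounds == max_rounds:
            raise RuntimeError("no termination within max_rounds")
        new_edges = []
        for u in g:
            if g[u]:
                v = rng.choice(sorted(g[u]))  # first hop: random out-neighbour
                if g[v]:
                    w = rng.choice(sorted(g[v]))  # second hop
                    if w != u:
                        new_edges.append((u, w))
        for u, w in new_edges:
            g[u].add(w)
        rounds += 1
    return rounds, g

# Directed path 0 -> 1 -> ... -> 7.
path = {u: ({u + 1} if u < 7 else set()) for u in range(8)}
rounds, g = directed_two_hop(path, random.Random(0))
print(rounds, sorted(g[0]))  # sorted(g[0]) is [1, 2, 3, 4, 5, 6, 7]
```

On a directed acyclic path, every walk $u \to v \to w$ ends at a node reachable from $u$, so at termination each $G_t$ adjacency set equals the closure exactly.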

Theorem 14. On any $n$-node directed graph, the two-hop walk terminates in $O(n^2 \log n)$ rounds with high probability. Furthermore, there exists a (weakly connected) directed graph for which the process takes $\Omega(n^2 \log n)$ rounds to terminate.

The lower bound in the above theorem, whose proof is deferred to Appendix D, takes advantage of the fact that the initial graph is not strongly connected. Extending the above analysis to strongly connected graphs appears to be much more difficult, since the events corresponding to the addition of new edges interact in significant ways. We present an $\Omega(n^2)$ lower bound for a strongly connected graph by a careful analysis that tracks the event probabilities over time and takes dependencies into account. The graph $G_0$, depicted in Figure 3, is similar to the example in [16] used to establish an $\Omega(n)$ lower bound on the Random Pointer Jump algorithm, in which each node learns all the neighbors of a random neighbor in each step. Since the graphs change constantly over time in both processes, the dynamic edge distributions differ significantly in the two cases, and we need a substantially different analysis. Due to space constraints, we defer the proof to Appendix D.

Theorem 15. There exists a strongly connected directed graph for which the expected number of rounds taken by the two-hop process is $\Omega(n^2)$.



[Figure: diagram with a block labeled "Clique"]
Figure 3: Lower bound example for two-hop walk process in directed graphs



6 Conclusion 

We have analyzed two natural gossip-based discovery processes in networks and showed almost-tight bounds 
on their convergence in arbitrary networks. Our processes are motivated by the resource discovery problem 
in distributed networks as well as by the evolution of social networks. We would like to study variants of
the processes that take into account failures associated with forming connections, the joining and leaving of
nodes, or having only a subset of nodes participate in forming connections. We believe our techniques can
be extended to analyze such situations as well. From a technical standpoint, the main problem left open by 
our work is to resolve the logarithmic factor gap between the upper and lower bounds. It is not hard to show 
that from the perspective of increasing the minimum degree by a constant factor, our analysis is tight up to 
constant factors. It is conceivable, however, that a sharper upper bound can be obtained by an alternative 
analysis that uses a "smoother" measure of progress. 
A Proofs for Section 2



Proof of Lemma 1

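If $N_t^3(u)$ is not empty, consider a node $v \in N_t^3(u)$. Every neighbor of $v$ is at distance 2, 3, or 4 from $u$, so since $d_t(v) \ge \delta_t$, we have $|\bigcup_{i=2}^4 N_t^i(u)| \ge d_t(v) \ge \delta_t$. We also have $|N_t^1(u)| \ge \delta_t$ because $d_t(u) \ge \delta_t$. Since $N_t^1(u)$ and $\bigcup_{i=2}^4 N_t^i(u)$ are disjoint,

$$\Big|\bigcup_{i=1}^4 N_t^i(u)\Big| \ge 2\delta_t.$$

If $N_t^3(u)$ is empty, then $|N_t^1(u) \cup N_t^2(u)| = n - 1$ because $G_t$ is connected, and thus $|\bigcup_{i=1}^4 N_t^i(u)| = n - 1$. Combining the two cases completes the proof of this lemma. □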
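Lemma 1 is easy to test empirically. The sketch below computes BFS distances, counts the nodes within distance 4 of each $u$, and checks the bound $\min\{2\delta_t, n - 1\}$ on random connected graphs. The generator (a random spanning tree plus independent extra edges) and all function names are illustrative choices, not part of the paper.

```python
import random
from collections import deque

def distances_from(adj, u):
    """BFS distances from u in an undirected graph given as dict of neighbour sets."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def ball_size(adj, u, radius=4):
    """|N^1(u) ∪ ... ∪ N^radius(u)|: number of nodes at distance 1..radius from u."""
    return sum(1 for d in distances_from(adj, u).values() if 1 <= d <= radius)

def random_connected_graph(n, p, rng):
    """Random spanning tree (guarantees connectivity) plus independent extra edges."""
    adj = {v: set() for v in range(n)}
    order = list(range(n))
    rng.shuffle(order)
    for k in range(1, n):
        a, b = order[k], order[rng.randrange(k)]
        adj[a].add(b)
        adj[b].add(a)
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < p:
                adj[a].add(b)
                adj[b].add(a)
    return adj

def lemma1_holds(adj):
    n = len(adj)
    delta = min(len(nbrs) for nbrs in adj.values())
    return all(ball_size(adj, u) >= min(2 * delta, n - 1) for u in adj)

rng = random.Random(5)
print(all(lemma1_holds(random_connected_graph(30, 0.1, rng)) for _ in range(50)))  # True
```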
Proof of Lemma 2

Since $X$ only increases with $k$, without loss of generality assume that $k = m$. We can now view this as a coupon collector problem [25], where $X_{m+1-i}$ is the number of steps to collect the $i$th coupon. Consider the probability of not obtaining the $i$th coupon after $(c + 1) n \ln n$ steps. This probability is

$$\left(1 - \frac{1}{n}\right)^{(c+1) n \ln n} \le e^{-(c+1)\ln n} = \frac{1}{n^{c+1}}.$$

By the union bound, the probability that some coupon has not been collected after $(c + 1) n \ln n$ steps is less than $1/n^c$. This completes the proof of this lemma. □
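The coupon-collector estimate can be checked two ways: the per-coupon inequality $(1 - 1/n)^{(c+1) n \ln n} \le n^{-(c+1)}$ directly, and the tail of a sum of geometric variables with success probabilities $i/n$ by simulation. The sketch assumes this is the setting of Lemma 2, as suggested by the coupon-collector reduction above; all function names are ours.

```python
import math
import random

def geometric(p, rng):
    """Trials up to and including the first success, success probability p (inverse transform)."""
    if p >= 1:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))

def collection_time(m, n, rng):
    """X_1 + ... + X_m with X_i geometric of success probability i/n (coupon collector, m <= n)."""
    return sum(geometric(i / n, rng) for i in range(1, m + 1))

def tail_fraction(n, c, trials, rng):
    """Empirical Pr[collection_time(n, n) > (c + 1) n ln n]; Lemma 2 bounds it by 1/n^c."""
    bound = (c + 1) * n * math.log(n)
    return sum(collection_time(n, n, rng) > bound for _ in range(trials)) / trials

# The per-coupon estimate used in the proof: (1 - 1/n)^((c+1) n ln n) <= n^-(c+1).
for n in (10, 100, 1000):
    for c in (1, 2, 3):
        assert (1 - 1 / n) ** ((c + 1) * n * math.log(n)) <= n ** -(c + 1)

print(tail_fraction(50, 1, 2000, random.Random(3)), "vs. bound", 1 / 50)
```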
B Lower bound proof for the triangulation process



Proof of Theorem 9

We first observe that during the triangulation process there is a time $t$ when the number of missing edges is at least $m = \Omega(\sqrt{k})$ and the minimum degree is at least $n/3$. If $k \le 2n/3 - 1$, this is true initially, since every node has degree at least $n - 1 - k \ge n/3$; for larger $k$, this is true at the first time $t$ at which the minimum degree is large enough. The second case follows since the degree of a node (and thus also the minimum degree) can at most double in each step, guaranteeing that the minimum degree is not larger than $2n/3$ at time $t$, which also implies that at least $n/3 = \Omega(\sqrt{k})$ edges are still missing.

Given the bound on the minimum degree, any missing edge $\{u, v\}$ is added by a fixed node $w$ with probability at most $c/n^2$ for a constant $c$. Since there are at most $n - 2$ such nodes, the probability that a missing edge gets added in a given step is at most $c/n$. To analyze the time needed for all missing edges to be added, we denote by $X_i$ the random variable counting the number of steps needed until the $i$th of the $m$ missing edges is added. We would like to analyze $\Pr[X_1 \le T, X_2 \le T, \ldots, X_m \le T]$ for an appropriately chosen number of steps $T$. Note that the events $X_i \le T$ and $X_j \le T$ are not independent and can indeed be positively or negatively correlated. Nevertheless, regardless of conditioning on any of the events $X_j \le T$, we have $\Pr[X_1 \le T] \le 1 - (1 - c/n)^T \le 1 - 1/\sqrt{m}$ for an appropriately chosen $T = \Theta(n \log m)$, where $m$ is again the number of missing edges at time $t$. Thus,

$$
\begin{aligned}
\Pr[X_1 \le T, X_2 \le T, \ldots, X_m \le T]
&= \Pr[X_1 \le T \mid X_2 \le T, \ldots, X_m \le T] \cdot \Pr[X_2 \le T \mid X_3 \le T, \ldots, X_m \le T] \cdots \Pr[X_m \le T]\\
&\le \left(1 - \frac{1}{\sqrt{m}}\right)^m = O\!\left(e^{-\sqrt{m}}\right) = O\!\left(e^{-k^{1/4}}\right).
\end{aligned}
$$

This shows that, with probability at least $1 - O(e^{-k^{1/4}})$, the triangulation process takes at least $\Omega(n \log m) = \Omega(n \log k)$ steps to complete. □
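Two elementary facts close this proof: $1 - x \le e^{-x}$ turns the product bound into $e^{-\sqrt{m}}$, and since the number $m$ of missing edges at time $t$ is at most the initial $k$ (edges are only ever added) and at least $\Omega(\sqrt{k})$, $\log m$ and $\log k$ agree up to constant factors. A short derivation:

```latex
\Bigl(1 - \tfrac{1}{\sqrt{m}}\Bigr)^{m} \le e^{-m/\sqrt{m}} = e^{-\sqrt{m}},
\qquad
m = \Omega(\sqrt{k}) \;\Longrightarrow\; \sqrt{m} = \Omega(k^{1/4}),
\qquad
\Omega(\sqrt{k}) \le m \le k \;\Longrightarrow\; \log m = \Theta(\log k).
```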



C Proofs for the two-hop walk 



Proof of Lemma 10

By the definition of $\delta_0$, $d_0(w) \ge \delta_0$ for all $w$ in $N_0^1(u)$. Let $X$ be the first round at which $|N_t^2(u)| \ge \delta_0/2$. We consider two cases. If $X$ is at most $cn \log n$ for a constant $c$ to be specified later, then the claim of the lemma holds. In the remainder of this proof, we consider the case where $X$ is greater than $cn \log n$; thus, for $0 \le t \le cn \log n$, $|N_t^2(u)| < \delta_0/2$.

Consider any node $w$ in $N_0^1(u)$. Since $d_0(w) \ge \delta_0$ and $|N_0^2(u)| < \delta_0/2$, $w$ has at least $\delta_0/2$ edges to nodes in $N_0^1(u)$. Fix a node $v$ in $N_0^2(u)$. In the following, we first show that within $O(n \log n)$ rounds, $v$ becomes strongly tied to the neighbors of $u$ with probability at least $1 - 1/n^3$. Let $T_1$ denote the first round at which $v$ is strongly tied to $N_t^1(u)$, i.e., when $|N_t^1(v) \cap N_t^1(u)| \ge \delta_0/4$. We know that $v$ has at least one neighbor, say $w_1$, in $N_0^1(u)$. Consider any $t < T_1$. Since $v$ is weakly tied to $N_t^1(u)$ at time $t$, $w_1$ has at least $\delta_0/4$ neighbors in $N_t^1(u)$ that do not have an edge to $v$ at time $t$. This implies

$$\Pr[v \text{ connects to a node in } N_t^1(u) \text{ through } w_1 \text{ in round } t] \ \ge\ \frac{1}{n} \cdot \frac{1}{4} = \frac{1}{4n}.$$

Let $e_1$ denote the event $\{v$ connects to a node in $N_t^1(u)\}$, and let $X_1$ be the number of rounds for $e_1$ to occur. When $e_1$ occurs, let $w_2$ denote a witness for $e_1$, i.e., $w_2$ is a node in $N_t^1(u)$ to which $v$ connects in round $X_1$. We note that $w_1, w_2 \in N_t^1(u)$. If $v$ is weakly tied to $N_t^1(u)$, both $w_1$ and $w_2$ have at least $\delta_0/4$ neighbors in $N_t^1(u)$ that do not have an edge to $v$ yet. Let $e_2$ denote the event $\{v$ connects to a node in $N_t^1(u)\}$, and let $X_2$ be the number of rounds for $e_2$ to occur. Then $\Pr[e_2] \ge 2 \cdot \frac{1}{4n} = \frac{1}{2n}$. Similarly, we define $e_3, X_3, \ldots, e_{\delta_0/4}, X_{\delta_0/4}$ and obtain $\Pr[e_i] \ge i/(4n)$. We now apply Lemma 2 to obtain that $X_1 + X_2 + \cdots + X_{\delta_0/4}$ is at most $16 n \ln n$ with probability at least $1 - 1/n^3$. Thus, with probability at least $1 - |N_0^2(u)|/n^3$, $T_1 \le 16 n \ln n$. After $T_1$ rounds, we obtain that for any $v \in N_0^2(u)$,

$$\Pr[u \text{ connects to } v \text{ in a single round}] \ \ge\ \frac{\delta_0/4}{2\delta_0} \cdot \frac{1}{n} = \frac{1}{8n},$$

which implies that with probability at least $1 - 1/n^3$, $u$ has an edge to every node in $N_0^2(u)$ within another $T_2 \le 24 n \ln n$ rounds.

Let $T_3$ equal $T_1 + T_2$; we set $c$ to be at least $120 \ln 2$ so that $X > 3T_3$. We thus have $N_0^2(u) \subseteq N_{T_3}^1(u)$, $N_0^3(u) \subseteq N_{T_3}^1(u) \cup N_{T_3}^2(u)$, and $N_0^4(u) \subseteq N_{T_3}^1(u) \cup N_{T_3}^2(u) \cup N_{T_3}^3(u)$. We now repeat the above analysis twice more and obtain that at time $T = 3T_3$, $N_0^2(u) \cup N_0^3(u) \cup N_0^4(u) \subseteq N_T^1(u)$ with probability at least $1 - |N_0^2(u) \cup N_0^3(u) \cup N_0^4(u)|/n^3 \ge 1 - 1/n^2$. By Lemma 1, we have $d_T(u) \ge |\bigcup_{i=1}^4 N_0^i(u)| \ge \min\{2\delta_0, n - 1\}$, thus completing the proof of the lemma. □
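The constants in this proof fit together as follows; the sketch checks them numerically. It assumes that $\log$ in $cn \log n$ is base 2, which is what makes the threshold $c \ge 120 \ln 2$ come out exactly, and it uses the per-round bounds $i/(4n)$ and $1/(8n)$ stated in the proof. Names are ours.

```python
import math

# Lemma 2 with success probabilities i/(4n) and c = 3 gives (c + 1) * 4 n ln n = 16 n ln n.
T1_CONST = (3 + 1) * 4
assert T1_CONST == 16

def miss_probability(n):
    """(1 - 1/(8n))^(24 n ln n): chance u still misses a fixed v after T_2 rounds."""
    return (1 - 1 / (8 * n)) ** (24 * n * math.log(n))

for n in (10, 100, 10_000):
    assert miss_probability(n) <= n ** -3

# 3 * T_3 = 3 * (16 + 24) n ln n = 120 n ln n.  If "log" means log base 2,
# c n log2 n >= 120 n ln n holds exactly when c >= 120 ln 2.
c_min = 120 * math.log(2)
for n in (10, 100, 10_000):
    assert math.isclose(c_min * n * math.log2(n), 120 * n * math.log(n))
print(round(c_min, 2))  # 83.18
```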

Proof of Lemma 11

Let $X$ be the first round at which $|N_t^2(u)| < \delta_0/4$. We consider two cases. If $X$ is at most $cn \log n$ for a constant $c$ to be specified later, then the claim of the lemma holds. In the remainder of this proof, we consider the case where $X$ is greater than $cn \log n$; thus, for $0 \le t \le cn \log n$, $|N_t^2(u)| \ge \delta_0/4$. If $v \in N_0^2(u)$ is strongly tied to $N_0^1(u)$, then

$$\Pr[u \text{ connects to } v \text{ in a single round}] \ \ge\ \frac{|N_t^1(u) \cap N_t^1(v)|}{|N_t^1(u)|} \cdot \frac{1}{n} \ \ge\ \frac{\delta_0/4}{(1 + 1/8)\delta_0} \cdot \frac{1}{n} = \frac{2}{9n}.$$

Thus, within $T = 13.5 n \ln n$ rounds, $u$ will add an edge to $v$ with probability at least $1 - 1/n^3$. If at least $\delta_0/8$ nodes in $N_0^2(u)$ are strongly tied to $N_0^1(u)$, then $u$ will add edges to all these nodes within $T$ rounds with probability at least $1 - 1/n^2$.
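The two numbers in this step, $2/(9n)$ and $13.5\, n \ln n$, are tied together so that the exponent becomes exactly $3 \ln n$. A small exact-arithmetic check, using Python's standard `fractions` module (the function name is ours):

```python
import math
from fractions import Fraction

# Per-round success probability for a strongly tied v, as bounded in the text:
# (delta0/4) / ((1 + 1/8) * delta0) * (1/n) = (2/9) * (1/n).
ratio = Fraction(1, 4) / (1 + Fraction(1, 8))
assert ratio == Fraction(2, 9)

# Over 13.5 n ln n rounds the exponent is (2/9) * 13.5 * ln n = 3 ln n.
assert ratio * Fraction(27, 2) == 3

def miss_probability(n):
    """(1 - 2/(9n))^(13.5 n ln n): chance that u still has no edge to v."""
    return (1 - 2 / (9 * n)) ** (13.5 * n * math.log(n))

for n in (10, 100, 10_000):
    assert miss_probability(n) <= n ** -3
```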

In the remainder of this proof, we focus on the case where fewer than $\delta_0/8$ nodes in $N_0^2(u)$ are initially strongly tied to $N_0^1(u)$. In this case, because $|N_0^2(u)| \ge \delta_0/4$, more than $\delta_0/8$ nodes in $N_0^2(u)$ are weakly tied to $N_0^1(u)$ and thus have at least $3\delta_0/4$ edges to nodes in $N_0^2(u) \cup N_0^3(u)$.

In the following, we show that $u$ will connect to $\delta_0/8$ nodes within $O(n \log n)$ rounds with probability at least $1 - 1/n^2$. For any round $t$, let $W_t$ denote the set of nodes in $N_t^2(u)$ that are weakly tied to $N_t^1(u)$. We refer to a length-2 path from $u$ to a node two hops away as an out-path. Let $P_0$ denote the set of out-paths to $W_0$. Since at least $\delta_0/8$ nodes in $N_0^2(u)$ are weakly tied to $N_0^1(u)$, $|P_0|$ is at least $\delta_0/8$ at time $t = 0$. Define $e_1 = \{u$ picks an out-path in $P_0$ and connects to a node $v_1$ in $N_0^2(u)\}$, and let $X_1$ be the number of rounds for $e_1$ to occur. For $0 \le t < X_1$ and each $w_i \in N_t^1(u)$, let $f_i$ be the number of edges from $w_i$ to nodes in $N_t^1(u) \cup N_t^2(u)$, and let $p_i$ be the number of edges from $w_i$ to nodes in $N_t^2(u)$ that are weakly tied to $N_t^1(u)$.



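We then have

$$\Pr[e_1] \ \ge\ \sum_i \frac{1}{d_t(u)} \cdot \frac{p_i}{f_i} \ \ge\ \sum_i \frac{p_i}{d_t(u)\,(n-1)} \ \ge\ \frac{|P_0|}{(1 + 1/8)\delta_0 (n-1)} \ \ge\ \frac{\delta_0/8}{(1 + 1/8)\delta_0 (n-1)} \ \ge\ \frac{1}{9n}.$$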

After $X_1$ rounds, $u$ picks an out-path in $P_0$ and connects to such a node $v_1$. Define $P_1$ to be the set of out-paths from $u$ to $W_{X_1}$. We now place a lower bound on $|P_1 \setminus P_0|$. Since $v_1 \in N_0^2(u)$ is added to $N_{X_1}^1(u)$, the out-paths in $P_0$ consisting of edges from $v_1$ to nodes in $N_0^1(u)$ are not in $P_1$. The number of out-paths we lose because of this is at most $\delta_0/4$. But $v_1$ also has at least $3\delta_0/4$ edges to $N_0^2(u) \cup N_0^3(u)$. The endpoints of these edges are in $N_{X_1}^1(u) \cup N_{X_1}^2(u)$. If more than $\delta_0/8$ of them are in $N_{X_1}^1(u)$, then $d_{X_1}(u) \ge (1 + 1/8)\delta_0$. Now consider the case that fewer than $\delta_0/8$ such endpoints are in $N_{X_1}^1(u)$. Then the number of edges from $v_1$ to $N_{X_1}^2(u)$ is at least $3\delta_0/4 - \delta_0/4 - \delta_0/8 = 3\delta_0/8$. Among the endpoints of these edges, if more than $\delta_0/8$ are strongly tied to $N_{X_1}^1(u)$, then by our earlier argument the degree of $u$ becomes at least $(1 + 1/8)\delta_0$ within $O(n \log n)$ rounds with probability $1 - 1/n$. If not, more than $\delta_0/4$ newly added edges point to nodes that are weakly tied to $N_{X_1}^1(u)$. Thus, $|P_1 \setminus P_0|$ is at least $\delta_0/4$, and $|P_1| \ge 2 \cdot \delta_0/8$. Define $e_2 = \{u$ picks an out-path in $P_1$ and connects to a node $v_2\}$, and let $X_2$ be the number of rounds for $e_2$ to occur. During $X_1 \le t < X_2$, $\Pr[e_2]$ is at least $2 \cdot \frac{1}{9n}$. Similarly, we define $e_3, X_3, \ldots, e_{\delta_0/8}, X_{\delta_0/8}$ and derive $\Pr[e_i] \ge i/(9n)$. By Lemma 2, the number of rounds until $d_t(u) \ge (1 + 1/8)\delta_0$ is bounded by

$$T = X_1 + X_2 + \cdots + X_{\delta_0/8} \le (2 + 1) \cdot 9 n \ln n = 27 n \ln n$$

with probability at least $1 - 1/n^2$, completing the proof of this lemma. □
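The constant 27 is $(c + 1) \cdot 9$ with $c = 2$: the coupon-collector reasoning of Lemma 2 applied with per-round probabilities scaled by $1/(9n)$. A numeric sketch of that union bound, assuming at most $n$ "coupons" (the function name and its parameters are ours):

```python
import math
from fractions import Fraction

# Pr[e_1] >= (delta0/8) / ((1 + 1/8) * delta0 * (n - 1)) = 1 / (9 (n - 1)) >= 1 / (9n).
assert Fraction(1, 8) / (1 + Fraction(1, 8)) == Fraction(1, 9)

def union_bound_failure(n, scale=9, c=2):
    """Lemma 2 style tail, rescaled: at most n coupons, each missed for
    (c + 1) * scale * n * ln n rounds w.p. <= (1 - 1/(scale n))^rounds."""
    rounds = (c + 1) * scale * n * math.log(n)
    return n * (1 - 1 / (scale * n)) ** rounds

for n in (10, 100, 10_000):
    assert union_bound_failure(n) <= n ** -2  # T = 27 n ln n w.p. at least 1 - 1/n^2
```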

Proof of Theorem 12

We first show that in $T = O(n \log n)$ rounds, the minimum degree of the graph increases by a factor of $1 + 1/8$, i.e., $\delta_T \ge \min\{(1 + 1/8)\delta_0, n - 1\}$. We can then apply this argument $O(\log n)$ times, thus completing the proof of this theorem.

For each $u$ with $d_0(u) < \min\{(1 + 1/8)\delta_0, n - 1\}$, we consider the following two cases. First, if $|N_0^2(u)| \ge \delta_0/2$, then by Lemma 11, as long as $|N_t^2(u)| \ge \delta_0/4$ for all $t \ge 0$, we have $d_T(u) \ge \min\{(1 + 1/8)\delta_0, n - 1\}$ with probability $1 - 1/n^2$, where $T = O(n \log n)$. Whenever this condition is not satisfied, at least $\delta_0/4$ nodes in $N_0^2(u)$ have moved to $N_T^1(u)$, which means $d_T(u) \ge \min\{(1 + 1/4)\delta_0, n - 1\}$.

Second, if $|N_0^2(u)| < \delta_0/2$, then by Lemma 10, as long as $|N_t^2(u)| < \delta_0/2$ for all $t \ge 0$, we have $d_T(u) \ge \min\{(1 + 1/8)\delta_0, n - 1\}$ with probability $1 - 1/n^2$, where $T = O(n \log n)$. Whenever this condition is not satisfied, we are back to the analysis in the first case, and the degree of $u$ becomes at least $\min\{(1 + 1/8)\delta_0, n - 1\}$ with probability $1 - 1/n^2$.

Combining the above two cases and applying a union bound over the at most $n$ nodes whose degree is between $\delta_0$ and $\min\{(1 + 1/8)\delta_0, n - 1\}$, we get that with probability $1 - 1/n$ the minimum degree of $G$ becomes at least $\min\{(1 + 1/8)\delta_0, n - 1\}$ within $O(n \log n)$ rounds.

Applying the above argument $O(\log n)$ times shows that the two-hop walk process completes in $O(n \log^2 n)$ rounds with high probability. □



D Proofs for the two-hop walk in directed graphs 



Proof of Theorem 14

Consider any pair of nodes $u$ and $v$ such that $u$ has a path to $v$ in $G_0$. Consider a shortest path $(v_0, v_1, v_2, \ldots, v_m)$ from $u$ to $v$, where $v_0 = u$, $v_m = v$, and $m < n$. Fix a time step $t$. Let $e_i$ denote the event that an edge is added from $v_i$ to $v_{i+2}$ in step $t$. The probability of occurrence of $e_i$ is $\Pr[e_i] \ge 1/n^2$, and the $e_i$'s are independent of one another. Hence

$$\Pr\Big[\bigcup_i e_i\Big] \ \ge\ \sum_i \Pr[e_i] - \sum_i \sum_{j \ne i} \Pr[e_i]\Pr[e_j] \ \ge\ m \cdot \frac{1}{n^2} - m(m-1) \cdot \frac{1}{n^4}.$$

Let $X_1$ denote the number of steps it takes for the length of the above path to decrease by 1. It is clear that $E[X_1] \le n^2/m$. In general, let $X_i$ denote the number of steps it takes for the length of the above path to decrease by $i$. By Lemma 2, the number of steps it takes for the above path to shrink to an edge is at most $4 n^2 \ln n$ with probability at least $1 - 1/n^3$. Taking a union bound over all pairs $(u, v)$ yields the desired upper bound.
For the lower bound, consider a graph $G_0$ with node set $\{1, 2, \ldots, n\}$ and edge set

$$\{(3i, j), (3i + 1, j) : 0 < i < n/4,\ 3n/4 < j \le n\} \cup \{(3i, 3i + 1), (3i + 1, 3i + 2) : 0 < i < n/4\}.$$

The only edges that need to be added by the two-hop process are the edges $(3i, 3i + 2)$ for $0 < i < n/4$. The probability that node $3i$ adds the edge $(3i, 3i + 2)$ in any round is at most $16/n^2$. The probability that edge $(3i, 3i + 2)$ is not added in $(n^2 \ln n)/32$ rounds is at least $1/\sqrt{n}$. Since the events associated with adding each of these edges are independent, the probability that all the $n/4$ edges are added in $(n^2 \ln n)/32$ rounds is at most $(1 - 1/\sqrt{n})^{n/4} \le e^{-\sqrt{n}/4}$, completing the lower bound proof. □
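The claim that only the edges $(3i, 3i + 2)$ must be added can be checked mechanically by comparing $G_0$ with its reachability closure. The sketch reads the index ranges as $0 < i < n/4$ and $3n/4 < j \le n$ with $n$ divisible by 4; these readings, and the names `theorem14_graph` and `edges_to_add`, are assumptions for illustration. Under this reading a few low-numbered nodes are isolated, which does not change which edges must be added.

```python
from collections import deque

def theorem14_graph(n):
    """Lower-bound graph on nodes 1..n as out-neighbour sets.
    Assumed reading: n divisible by 4, 0 < i < n/4, sinks j with 3n/4 < j <= n."""
    assert n % 4 == 0
    out = {v: set() for v in range(1, n + 1)}
    sinks = range(3 * n // 4 + 1, n + 1)
    for i in range(1, n // 4):
        out[3 * i].update(sinks)
        out[3 * i + 1].update(sinks)
        out[3 * i].add(3 * i + 1)
        out[3 * i + 1].add(3 * i + 2)
    return out

def reachable_from(out, s):
    """Nodes reachable from s by a directed path of length >= 1 (s excluded)."""
    seen, queue = set(), deque([s])
    while queue:
        x = queue.popleft()
        for y in out[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    seen.discard(s)
    return seen

def edges_to_add(out):
    """Pairs (u, v) with a u -> v path in G_0 but no edge (u, v): what the process must add."""
    return {(u, v) for u in out for v in reachable_from(out, u) if v not in out[u]}

print(sorted(edges_to_add(theorem14_graph(16))))  # [(3, 5), (6, 8), (9, 11)]
```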

Proof of Theorem 15

The graph $G_0 = (V, E)$ is depicted in Figure 4 and formally defined by $V = \{1, 2, \ldots, n\}$, with $n$ even, and

$$E = \{(i, j) : 1 \le i, j \le n/2,\ i \ne j\} \cup \{(i, i + 1) : n/2 \le i < n\} \cup \{(i, j) : i > j,\ i > n/2,\ i, j \in V\}.$$
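The proof relies on two structural facts about $G_0$: it is strongly connected, and initially the smallest untouched cut is at $X = n/2$. A small builder makes both checkable. The sketch reads the path edges as $(i, i + 1)$ for $n/2 \le i < n$ and excludes self-loops from the clique; both are assumptions about the definition above, and the function names are hypothetical.

```python
from collections import deque

def theorem15_graph(n):
    """G_0 on V = {1..n} (n even), as out-neighbour sets.
    Assumptions: no self-loops in the clique; path edges (i, i+1) for n/2 <= i < n."""
    assert n % 2 == 0
    half = n // 2
    out = {v: set() for v in range(1, n + 1)}
    for i in range(1, half + 1):          # clique on {1, ..., n/2}
        out[i].update(j for j in range(1, half + 1) if j != i)
    for i in range(half, n):              # forward path n/2 -> n/2+1 -> ... -> n
        out[i].add(i + 1)
    for i in range(half + 1, n + 1):      # every i > n/2 points back to all j < i
        out[i].update(range(1, i))
    return out

def reaches_all(adj, s):
    seen, queue = {s}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(adj)

def strongly_connected(out):
    rev = {v: set() for v in out}
    for u, vs in out.items():
        for v in vs:
            rev[v].add(u)
    s = next(iter(out))
    return reaches_all(out, s) and reaches_all(rev, s)

def smallest_untouched_cut(out):
    """Smallest x whose only edge from {u <= x} to {v > x} is (x, x + 1), i.e. X."""
    for x in range(1, len(out)):
        crossing = {(u, v) for u, vs in out.items() for v in vs if u <= x < v}
        if crossing == {(x, x + 1)}:
            return x
    return None

g0 = theorem15_graph(12)
print(strongly_connected(g0), smallest_untouched_cut(g0))  # True 6
```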

[Figure: diagram with a block labeled "Clique"]
Figure 4: Lower bound example for two-hop walk process in directed graphs

We first establish an upper bound on the probability that edge $(i, i + h)$ is added by the start of round $t$, for a given $i$ with $1 \le i \le n - h$. Let $p_{h,t}$ denote this probability. The following base cases are immediate: $p_{h,0}$ is 1 for $h = 1$ and $h \le 0$, and 0 otherwise. Next, the edge $(i, i + h)$ is in $G_{t+1}$ if and only if $(i, i + h)$ is either in $G_t$ or added in round $t$. In the latter case, $(i, i + h)$ is added by a two-hop walk $i \to i + k \to i + h$, where $-i < k \le n - i$. Since the out-degree of every node is at least $n/2$, for any $k$ the probability that $i$ takes such a walk is at most $4/n^2$. Hence

$$
\begin{aligned}
p_{h,t+1} &\le p_{h,t} + \frac{4}{n^2} \sum_{k > -i}^{n-i} p_{k,t}\, p_{h-k,t}\\
&= p_{h,t} + \frac{4}{n^2} \left( \sum_{k=1}^{i-1} p_{h+k,t} + \sum_{k=1}^{h-1} p_{k,t}\, p_{h-k,t} + \sum_{k=h+1}^{n-i} p_{k,t} \right).
\end{aligned}
\tag{1}
$$

We show by induction on $t$ that

$$p_{h,t} \le \left(\frac{at}{n^2}\right)^{h-1} \quad \text{for all } h \ge 1 \text{ and all } t \le \epsilon n^2, \tag{2}$$

where $a$ and $\epsilon$ are positive constants that are specified later.

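The induction base is immediate. For the induction step, we use the induction hypothesis for $t$ and Equation (1) and bound $p_{h,t+1}$ for $h \ge 2$ as follows, writing $x = at/n^2$:

$$
\begin{aligned}
p_{h,t+1} &\le x^{h-1} + \frac{4}{n^2}\left( \sum_{k=1}^{i-1} x^{h+k-1} + \sum_{k=1}^{h-1} x^{k-1} x^{h-k-1} + \sum_{k=h+1}^{n-i} x^{k-1} \right)\\
&\le x^{h-1} + \frac{4}{n^2}\left( (h-1)\, x^{h-2} + \frac{2x^{h}}{1 - x} \right)\\
&\le x^{h-1} + \frac{4}{n^2}\left( 1 + \frac{1}{1 - a\epsilon} \right)(h-1)\, x^{h-2}\\
&\le x^{h-1} + \frac{a}{n^2}(h-1)\, x^{h-2}\\
&\le \left( x + \frac{a}{n^2} \right)^{h-1} = \left( \frac{a(t+1)}{n^2} \right)^{h-1}.
\end{aligned}
$$

(In the second inequality, we combine the first and third summations and bound them by their infinite sums. In the third inequality, we use $t \le \epsilon n^2$, so that $x \le a\epsilon$, and take $\epsilon$ small enough that $2(a\epsilon)^2 \le 1$. For the fourth inequality, we set $a$ sufficiently large so that $a \ge 4 + 4/(1 - a\epsilon)$. The final inequality follows from the Taylor (binomial) series expansion.)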
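The induction step can be spot-checked numerically. The sketch evaluates the bound obtained from Equation (1) and the induction hypothesis once the first and third sums are replaced by their infinite sums, namely $x^{h-1} + \frac{4}{n^2}\bigl((h-1)x^{h-2} + \frac{2x^h}{1-x}\bigr)$ with $x = at/n^2$, and compares it with $(a(t+1)/n^2)^{h-1}$ over a grid of $t \le \epsilon n^2$ and $h$. The constants $a = 12$, $\epsilon = 1/32$ (and $a = 16$, $\epsilon = 1/64$ in the test) are illustrative choices satisfying $a \ge 4 + 4/(1 - a\epsilon)$ and $2(a\epsilon)^2 \le 1$; they are not taken from the paper.

```python
def after_combining_sums(x, h, n):
    """x^(h-1) + (4/n^2) * ((h-1) x^(h-2) + 2 x^h / (1 - x))."""
    return x ** (h - 1) + (4 / n**2) * ((h - 1) * x ** (h - 2) + 2 * x ** h / (1 - x))

def induction_step_holds(n, a, eps, h_max=40):
    """Check the bound is <= (a(t+1)/n^2)^(h-1) for 0 <= t <= eps n^2 and 2 <= h <= h_max."""
    assert a >= 4 + 4 / (1 - a * eps) and 2 * (a * eps) ** 2 <= 1
    for t in range(int(eps * n * n) + 1):
        x = a * t / n**2
        for h in range(2, h_max + 1):
            target = (a * (t + 1) / n**2) ** (h - 1)
            if after_combining_sums(x, h, n) > target * (1 + 1e-9):
                return False
    return True

print(induction_step_holds(n=64, a=12, eps=1 / 32))  # True
```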

For an integer $x$, let $C_x$ denote the cut $(\{u : u \le x\}, \{v : v > x\})$. We say that a cut $C_x$ is untouched at the start of round $t$ if the only edge in $G_t$ crossing the cut $C_x$ from $\{u : u \le x\}$ to $\{v : v > x\}$ is the edge $(x, x + 1)$; otherwise, we say $C_x$ is touched. Let $X$ denote the smallest integer $x$ such that $C_x$ is untouched. We note that $X$ is a random variable that also varies with time. Initially, $X = n/2$.

We divide the analysis into several phases, numbered from 0. A phase ends when $X$ changes. Let $X_i$ denote the value of $X$ at the start of phase $i$; thus $X_0 = n/2$. Let $T_i$ denote the number of rounds in phase $i$. A new edge is added to the cut $C_{X_i}$ only if either $X_i$ selects edge $(X_i, X_i + 1)$ as its first hop or a node $u < X_i$ selects $u \to X_i \to X_i + 1$. Since the degree of every node is at least $n/2$, the probability that a new edge is added to the cut $C_{X_i}$ is at most $2/n + n(4/n^2) = 6/n$, implying that $E[T_i] \ge n/6$.

We now place a bound on $X_{i+1}$. Fix a round $t \le \epsilon n^2$, and let $E_x$ denote the event that $C_x$ is touched by round $t$. We first place an upper bound on the probability of $E_x$ for arbitrary $x$ using Equation (2):

$$\Pr[E_x] \ \le\ \sum_{h \ge 2} h^2 \left(\frac{at}{n^2}\right)^{h-1} = \frac{at\,\bigl(4 - 3(at)/n^2 + (at)^2/n^4\bigr)}{n^2\,\bigl(1 - (at)/n^2\bigr)^3}$$

for $t \le \epsilon n^2$, where we use the identity $\sum_{h \ge 2} h^2 \delta^h = \delta^2(4 - 3\delta + \delta^2)/(1 - \delta)^3$ for $0 < \delta < 1$. We set $\epsilon$ sufficiently small so that $(4 - 3\epsilon + \epsilon^2)/(1 - \epsilon)^3 < 5$, implying that the above probability is at most $5\epsilon$.
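Both the series identity and the choice of $\epsilon$ can be verified numerically. The sketch compares partial sums of $\sum_{h \ge 2} h^2 \delta^h$ with the closed form, then bisects for the largest $\epsilon$ with $(4 - 3\epsilon + \epsilon^2)/(1 - \epsilon)^3 < 5$; the ratio is increasing on $[0, 1)$, so bisection applies. Function names are ours.

```python
def series(delta, terms=4000):
    """Partial sum of sum_{h >= 2} h^2 delta^h."""
    return sum(h * h * delta**h for h in range(2, terms))

def closed_form(delta):
    return delta**2 * (4 - 3 * delta + delta**2) / (1 - delta) ** 3

for d in (0.05, 0.3, 0.7):
    assert abs(series(d) - closed_form(d)) <= 1e-9 * closed_form(d)

def ratio(eps):
    """(4 - 3 eps + eps^2) / (1 - eps)^3, increasing on [0, 1)."""
    return (4 - 3 * eps + eps**2) / (1 - eps) ** 3

lo, hi = 0.0, 0.5          # ratio(0) = 4 < 5 and ratio(0.5) = 22 > 5
for _ in range(60):        # bisection for the largest eps with ratio(eps) < 5
    mid = (lo + hi) / 2
    if ratio(mid) < 5:
        lo = mid
    else:
        hi = mid
print(lo)  # roughly 0.093
```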
If $E_x$ were independent of $E_y$ for $x \ne y$, we could invoke a straightforward analysis using a geometric probability distribution to argue that $E[X_{i+1} - X_i]$ is at most $1/(1 - 5\epsilon) = O(1)$; to see this, observe that $X_{i+1} - X_i$ is stochastically dominated by the number of tosses of a biased coin needed to get one head, where the probability of tails is $5\epsilon$. The preceding independence does not hold, however; in fact, for $y > x$, $\Pr[E_y \mid E_x] > \Pr[E_y]$. We show that the impact of this correlation is very small when $x$ and $y$ are sufficiently far apart. We consider a sequence of cuts $C_{x_1}, C_{x_2}, \ldots, C_{x_\ell}, \ldots$ where $x_\ell = x_{\ell-1} + c\ell$ for a constant $c$ chosen sufficiently large, and we set $x_0 = X_i + 2$. We bound the conditional probability of $E_{x_\ell}$ given $E_{x_{\ell-1}} \cap E_{x_{\ell-2}} \cap \cdots \cap E_{x_1}$ as follows.

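$$
\begin{aligned}
\Pr[E_{x_\ell} \mid E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}]
&= \frac{\Pr[E_{x_\ell} \cap E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}]}{\Pr[E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}]}\\
&\le \frac{\Pr[E_{x_\ell} \cap E_{x_{\ell-1}} \cap \cdots \cap E_{x_1} \cap (C_{x_\ell} \cap (C_{x_{\ell-1}} \cup \cdots \cup C_{x_1}) = \emptyset)]}{\Pr[E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}]}\\
&\quad + \frac{\Pr[E_{x_{\ell-1}} \cap \cdots \cap E_{x_1} \cap (C_{x_\ell} \cap (C_{x_{\ell-1}} \cup \cdots \cup C_{x_1}) \ne \emptyset)]}{\Pr[E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}]}\\
&\le \frac{\Pr[E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}] \cdot \Pr[\text{a new edge is added from } \{x_{\ell-1}+1, \ldots, x_\ell\} \text{ to } \{x_\ell+1, \ldots, n\}]}{\Pr[E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}]}\\
&\quad + \frac{\Pr[\text{an edge spanning at least } c\ell \text{ hops is added across } C_{x_\ell}]}{\Pr[E_{x_{\ell-1}} \cap \cdots \cap E_{x_1}]}\\
&\le 5\epsilon + \epsilon = 6\epsilon,
\end{aligned}
$$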

where we set $c$ sufficiently large in the last step. Since $X_{i+1}$ is at most the smallest $x_\ell$ such that $C_{x_\ell}$ is untouched, we obtain that

$$E[X_{i+1} - X_i] \le 2 + \sum_{\ell \ge 1} (6\epsilon)^{\ell-1} c\ell^2 \le c'$$

for a constant $c'$ chosen sufficiently large. We thus obtain that after $\epsilon' n$ phases, $E[X]$ is at most $n/2 + \epsilon' c' n$, where $\epsilon'$ is chosen sufficiently small so that $n - E[X]$ is $\Omega(n)$. Since the expected length of each phase is at least $n/6$, it follows that the expected number of rounds it takes for the two-hop process to complete is $\Omega(n^2)$.

□ 